PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection

نویسندگان

  • Chi-Chou Huang
  • Chi-Chang Chang
  • Chi-Wei Chen
  • Shao-Yu Ho
  • Hsung-Pin Chang
  • Yen-Wei Chu
چکیده

Protein quaternary structure complex is also known as a multimer, which plays an important role in a cell. The dimer structure of transcription factors is involved in gene regulation, but the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus. The classification of the protein quaternary structure complex for the post-genome era of proteomics research will be of great help. Classification systems among protein quaternary structures have not been widely developed. Therefore, we designed the architecture of a two-layer machine learning technique in this study, and developed the classification system PClass. The protein quaternary structure of the complex is divided into five categories, namely, monomer, dimer, trimer, tetramer, and other subunit classes. In the framework of the bootstrap method with a support vector machine, we propose a new model selection method. Each type of complex is classified based on sequences, entropy, and accessible surface area, thereby generating a plurality of feature modules. Subsequently, the optimal model of effectiveness is selected as each kind of complex feature module. In this stage, the optimal performance can reach as high as 70% of Matthews correlation coefficient (MCC). The second layer of construction combines the first-layer module to integrate mechanisms and the use of six machine learning methods to improve the prediction performance. This system can be improved over 10% in MCC. Finally, we analyzed the performance of our classification system using transcription factors in dimer structure and virus-infection-associated glycoprotein in trimer structure. PClass is available via a web interface at http://predictor.nchu.edu.tw/PClass/.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Incremental Classifier from Data Streams

a novel evolving fuzzy rule-based classifier, namely parsimonious classifier (pClass), is proposed in this paper. pClass can set off its learning process either from scratch with an empty rule base or from an initially trained fuzzy model. Importantly, pClass not only adopts the open structure concept, where an automatic knowledge building process can be cultivated during the training process, ...

متن کامل

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

متن کامل

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most of the negative selection methods consider deterministic boundaries to distinguish between self and non-self-spaces. In this paper, two...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2018